In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import GPy
import pods
from IPython.display import display
In [45]:
from matplotlib.patches import Polygon
from matplotlib.collections import PatchCollection
import cPickle as pickle
import urllib
urllib.urlretrieve('http://staffwww.dcs.sheffield.ac.uk/people/M.Zwiessele/gpss/lab2/EastTimor.pickle', 'EastTimor2.pickle')
#Load the data
with open("./EastTimor2.pickle", "rb") as f:
    X, y, polygons = pickle.load(f)
Now we will create a map of East Timor and, using GPy, plot the data on top of it.
A classification model can be defined in a similar way to the regression model, but now using GPy.models.GPClassification. However, once we've defined the model, we also need to update the approximation to the likelihood; this runs the Expectation Propagation updates.
In [46]:
print y.shape
print X.shape
In [47]:
# Visualize a map of East Timor
fig = plt.figure()
ax = fig.add_subplot(111)
for p in polygons:
    ax.add_collection(PatchCollection([Polygon(p)], facecolor="#F4A460"))
ax.set_xlim(124.,127.5)
ax.set_ylim(-9.6,-8.1)
ax.set_xlabel("longitude")
ax.set_ylabel("latitude")
#Define the model
kern = GPy.kern.RBF(2)
model = GPy.models.GPClassification(X,y, kernel=kern)
display(model)
model.plot(ax=ax)
Out[47]:
The decision boundary should be quite poor! However, we haven't yet optimized the model. Try the following:
In [48]:
model.randomize()
model.optimize('bfgs')
print 'Log likelihood={}'.format(model.log_likelihood())
display(model)
fig = plt.figure()
ax = fig.add_subplot(111)
for p in polygons:
    ax.add_collection(PatchCollection([Polygon(p)], facecolor="#F4A460"))
ax.set_xlim(124.,127.5)
ax.set_ylim(-9.6,-8.1)
ax.set_xlabel("longitude")
ax.set_ylabel("latitude")
model.plot(ax=ax)
Out[48]:
The optimization is based on the likelihood approximation that was made when the model was constructed. However, because we have now changed the model parameters, the quality of that approximation has probably deteriorated. To improve the model we should iterate between updating the Expectation Propagation approximation and optimizing the model parameters.
In [50]:
# Exercise 1 answer here
Yesterday we considered the Olympic marathon data and noted that the 1904 result was an outlier. Today we'll see if we can deal with that outlier by using a non-Gaussian likelihood. Noise sampled from a Student-$t$ density has heavier tails than noise sampled from a Gaussian; however, it cannot be trivially assimilated into the Gaussian process framework. Below we use the Laplace approximation to incorporate this noise model.
In [2]:
# redownload the marathon data from yesterday and plot
data = pods.datasets.olympic_marathon_men()
X = data['X']
Y = data['Y']
plt.plot(X, Y, 'bx')
plt.xlabel('year')
plt.ylabel('marathon pace min/km')
Out[2]:
In [4]:
# make a Student-t likelihood with standard parameters
t_distribution = GPy.likelihoods.StudentT(deg_free=5, sigma2=2)
laplace = GPy.inference.latent_function_inference.Laplace()
kern = GPy.kern.RBF(1, lengthscale=50) + GPy.kern.Matern52(1, lengthscale=10) + GPy.kern.Bias(1)
model = GPy.core.GP(X, Y, kernel=kern, inference_method=laplace, likelihood=t_distribution)
model.constrain_positive('t_noise')  # keep the Student-t noise scale positive during optimization
model.optimize()
model.plot()
display(model)
In [53]:
# Exercise 2 answer